On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis
Abstract
What is the computational goal of auditory scene analysis? This is a key issue to address in the Marrian information-processing framework. It is also an important question for researchers in computational auditory scene analysis (CASA) because it bears directly on how a CASA system should be evaluated. In this chapter I discuss different objectives used in CASA. I suggest as a main CASA goal the use of the ideal time-frequency (T-F) binary mask whose value is one for a T-F unit where the target energy is greater than the interference energy and is zero otherwise. The notion of the ideal binary mask is motivated by the auditory masking phenomenon. Properties of the ideal binary mask are discussed, including their relationship to automatic speech recognition and human speech intelligibility. This CASA goal has led to algorithms that directly estimate the ideal binary mask in monaural and binaural conditions, and these algorithms have substantially advanced the state-of-the-art performance in speech separation.
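The mask definition given in the abstract can be illustrated with a minimal sketch. It assumes the target and interference energies are already available as two arrays of the same shape, one value per T-F unit; the abstract does not specify how those energies are computed, so the function name `ideal_binary_mask` and the toy values below are hypothetical and for illustration only.

```python
import numpy as np


def ideal_binary_mask(target_energy, interference_energy):
    """Return 1 for each T-F unit where target energy exceeds
    interference energy, and 0 otherwise.

    Assumption: both inputs hold per-unit energies on the same
    time-frequency grid (rows = frequency channels, columns = frames).
    """
    target_energy = np.asarray(target_energy, dtype=float)
    interference_energy = np.asarray(interference_energy, dtype=float)
    if target_energy.shape != interference_energy.shape:
        raise ValueError("target and interference must share one T-F grid")
    # Strict comparison: a unit with equal energies is assigned zero.
    return (target_energy > interference_energy).astype(np.uint8)


# Hypothetical toy energies: 2 frequency channels x 3 time frames.
target = np.array([[4.0, 0.5, 2.0],
                   [0.1, 3.0, 1.0]])
interference = np.array([[1.0, 2.0, 2.0],
                         [0.2, 1.0, 0.5]])

mask = ideal_binary_mask(target, interference)
print(mask)
```

The mask is called ideal because it is built from the separate target and interference energies, which are known only when the premixed signals are available; a CASA system working on the mixture has to estimate it.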
Similar resources
A computational auditory scene analysis system for speech segregation and robust speech recognition
A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, ...
A Mask Estimation Method Integrating Data Field Model for Speech Enhancement
In most approaches based on computational auditory scene analysis (CASA), the ideal binary mask (IBM) is used for noise reduction. However, it is almost impossible to obtain the IBM result. Errors in IBM estimation may greatly violate the smooth evolution of speech because energy is absent in many speech-dominated time-frequency (TF) units. To reduce the error, the ideal ratio ...
Single Channel Speech Enhancement Using Ideal Binary Mask Technique Based on Computational Auditory Scene Analysis
Single-channel speech enhancement is necessary where multichannel speech enhancement is not feasible because of space constraints in the intended device or cost. However, the limited information available from single-channel sound mixtures, or the unavailability of the speech source signals, makes it more difficult to separate the target speech from the background masker...
Supervised Speech Separation and Processing
In real-world environments, speech often occurs simultaneously with acoustic interference, such as background noise or reverberation. The interference usually leads to adverse effects on speech perception, and results in performance degradation in many speech applications, including automatic speech recognition and speaker identification. Monaural speech separation and processing aim to separat...
Binaural Source Separation in Non-ideal Reverberant Environments
This paper proposes a framework for separating several speech sources in non-ideal, reverberant environments. A movable human dummy head residing in a normal office room is used to model the conditions humans experience when listening to complex auditory scenes. Before source separation takes place, the dummy head explores the auditory scene and extracts characteristics the same way as...